{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Build Your Own Operator on macOS - Part 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "Welcome to Part 1 of our tutorial series on building a Computer Use Automation (CUA) operator using OpenAI. For the complete guide, check out our [full blog post](https://cua.ai/blog/build-your-own-operator-on-macos-1)."
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll learn how to combine the OpenAI Responses API (using the `computer-use-preview` model) with the CUA macOS interface to automate tasks. Instead of using Playwright like in the original [OpenAI CUA docs](https://platform.openai.com/docs/guides/tools-computer-use), we use the CUA computer to control a macOS sandbox (via the Lume CLI) and execute actions such as clicking and typing. The loop is as follows:\n",
    "\n",
    "1. **Initialize the CUA sandbox** using the `cua-computer` py package.\n",
    "2. **Capture an initial screenshot** of the sandbox.\n",
    "3. **Send the screenshot and a user prompt** to the OpenAI Responses API.\n",
    "4. **Receive a `computer_call` action** from the API.\n",
    "5. **Map and execute the action** on the CUA interface (e.g. move the cursor and click, type text, etc.).\n",
    "6. **Capture a new screenshot** and repeat the loop as needed.\n",
    "\n",
    "Note: For this example, you must have your OpenAI API key set up and the Lume daemon running with a downloaded macOS VM image (see installation instructions in the CUA documentation)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "- Install the `cua-computer` package and set up the Lume daemon as described in its documentation.\n",
    "- Ensure you have an OpenAI API key (set as an environment variable or in your OpenAI configuration).\n",
    "- This notebook uses asynchronous Python (async/await)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install the required packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If running outside of the monorepo:\n",
    "# %pip install \"cua-computer[all]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import required modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import asyncio\n",
    "import base64\n",
    "import openai\n",
    "\n",
    "from computer import Computer\n",
    "\n",
    "# Ensure your OpenAI API key is set\n",
    "openai.api_key = input(\"Enter your OpenAI API key: \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Mapping OpenAI Actions to CUA Methods\n",
    "\n",
    "The following helper function converts a `computer_call` action from the OpenAI Responses API into corresponding commands on the CUA interface. For example, if the API instructs a `click` action, we move the cursor and perform a left click on the Cua Sandbox. We will use the computer interface to execute the actions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "async def execute_action(computer, action):\n",
    "    action_type = action.type\n",
    "\n",
    "    if action_type == \"click\":\n",
    "        x = action.x\n",
    "        y = action.y\n",
    "        button = action.button\n",
    "        print(f\"Executing click at ({x}, {y}) with button '{button}'\")\n",
    "        await computer.interface.move_cursor(x, y)\n",
    "        if button == \"right\":\n",
    "            await computer.interface.right_click()\n",
    "        else:\n",
    "            await computer.interface.left_click()\n",
    "\n",
    "    elif action_type == \"type\":\n",
    "        text = action.text\n",
    "        print(f\"Typing text: {text}\")\n",
    "        await computer.interface.type_text(text)\n",
    "\n",
    "    elif action_type == \"scroll\":\n",
    "        x = action.x\n",
    "        y = action.y\n",
    "        scroll_x = action.scroll_x\n",
    "        scroll_y = action.scroll_y\n",
    "        print(f\"Scrolling at ({x}, {y}) with offsets (scroll_x={scroll_x}, scroll_y={scroll_y})\")\n",
    "        await computer.interface.move_cursor(x, y)\n",
    "        await computer.interface.scroll(scroll_y)  # Assuming CUA provides a scroll method\n",
    "\n",
    "    elif action_type == \"keypress\":\n",
    "        keys = action.keys\n",
    "        for key in keys:\n",
    "            print(f\"Pressing key: {key}\")\n",
    "            # Map common key names to CUA equivalents\n",
    "            if key.lower() == \"enter\":\n",
    "                await computer.interface.press_key(\"return\")\n",
    "            elif key.lower() == \"space\":\n",
    "                await computer.interface.press_key(\"space\")\n",
    "            else:\n",
    "                await computer.interface.press_key(key)\n",
    "\n",
    "    elif action_type == \"wait\":\n",
    "        print(\"Waiting for 2 seconds\")\n",
    "        await asyncio.sleep(2)\n",
    "\n",
    "    elif action_type == \"screenshot\":\n",
    "        print(\"Taking screenshot\")\n",
    "        # This is handled automatically in the main loop, but we can take an extra one if requested\n",
    "        screenshot = await computer.interface.screenshot()\n",
    "        return screenshot\n",
    "\n",
    "    else:\n",
    "        print(f\"Unrecognized action: {action_type}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The CUA/OpenAI Loop\n",
    "\n",
    "This cell defines a loop that:\n",
    "\n",
    "1. Initializes the CUA computer instance (connecting to a macOS sandbox).\n",
    "2. Captures a screenshot of the current state.\n",
    "3. Sends the screenshot (with a user prompt) to the OpenAI Responses API using the `computer-use-preview` tool.\n",
    "4. Processes the returned `computer_call` action and executes it using our helper function.\n",
    "5. Captures an updated screenshot after the action (this example runs one iteration, but you can wrap it in a loop).\n",
    "\n",
    "For a full loop, you would repeat these steps until no further actions are returned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "async def cua_openai_loop():\n",
    "    # Initialize the CUA computer instance (macOS sandbox)\n",
    "    async with Computer(display=\"1024x768\", memory=\"4GB\", cpu=\"2\", os_type=\"macos\") as computer:\n",
    "        await computer.run()\n",
    "\n",
    "        # Capture the initial screenshot\n",
    "        screenshot = await computer.interface.screenshot()\n",
    "        screenshot_base64 = base64.b64encode(screenshot).decode(\"utf-8\")\n",
    "\n",
    "        # Initial request to start the loop\n",
    "        response = openai.responses.create(\n",
    "            model=\"computer-use-preview\",\n",
    "            tools=[\n",
    "                {\n",
    "                    \"type\": \"computer_use_preview\",\n",
    "                    \"display_width\": 1024,\n",
    "                    \"display_height\": 768,\n",
    "                    \"environment\": \"mac\",\n",
    "                }\n",
    "            ],\n",
    "            input=[\n",
    "                {  # type: ignore\n",
    "                    \"role\": \"user\",\n",
    "                    \"content\": [\n",
    "                        {\"type\": \"input_text\", \"text\": \"Open Safari, download and install Cursor.\"},\n",
    "                        {\n",
    "                            \"type\": \"input_image\",\n",
    "                            \"image_url\": f\"data:image/png;base64,{screenshot_base64}\",\n",
    "                        },\n",
    "                    ],\n",
    "                }\n",
    "            ],\n",
    "            truncation=\"auto\",\n",
    "        )\n",
    "\n",
    "        # Continue the loop until no more computer_call actions\n",
    "        while True:\n",
    "            # Check for computer_call actions\n",
    "            computer_calls = [\n",
    "                item for item in response.output if item and item.type == \"computer_call\"\n",
    "            ]\n",
    "            if not computer_calls:\n",
    "                print(\"No more computer calls. Loop complete.\")\n",
    "                break\n",
    "\n",
    "            # Get the first computer call\n",
    "            call = computer_calls[0]\n",
    "            last_call_id = call.call_id\n",
    "            action = call.action\n",
    "            print(\"Received action from OpenAI Responses API:\", action)\n",
    "\n",
    "            # Handle any pending safety checks\n",
    "            if call.pending_safety_checks:\n",
    "                print(\"Safety checks pending:\", call.pending_safety_checks)\n",
    "                # In a real implementation, you would want to get user confirmation here\n",
    "                acknowledged_checks = call.pending_safety_checks\n",
    "            else:\n",
    "                acknowledged_checks = []\n",
    "\n",
    "            # Execute the action\n",
    "            await execute_action(computer, action)\n",
    "            await asyncio.sleep(1)  # Allow time for changes to take effect\n",
    "\n",
    "            # Capture new screenshot after action\n",
    "            new_screenshot = await computer.interface.screenshot()\n",
    "            new_screenshot_base64 = base64.b64encode(new_screenshot).decode(\"utf-8\")\n",
    "\n",
    "            # Send the screenshot back as computer_call_output\n",
    "            response = openai.responses.create(\n",
    "                model=\"computer-use-preview\",\n",
    "                previous_response_id=response.id,  # Link to previous response\n",
    "                tools=[\n",
    "                    {\n",
    "                        \"type\": \"computer_use_preview\",\n",
    "                        \"display_width\": 1024,\n",
    "                        \"display_height\": 768,\n",
    "                        \"environment\": \"mac\",\n",
    "                    }\n",
    "                ],\n",
    "                input=[\n",
    "                    {  # type: ignore\n",
    "                        \"type\": \"computer_call_output\",\n",
    "                        \"call_id\": last_call_id,\n",
    "                        \"acknowledged_safety_checks\": acknowledged_checks,\n",
    "                        \"output\": {\n",
    "                            \"type\": \"input_image\",\n",
    "                            \"image_url\": f\"data:image/png;base64,{new_screenshot_base64}\",\n",
    "                        },\n",
    "                    }\n",
    "                ],\n",
    "                truncation=\"auto\",\n",
    "            )\n",
    "\n",
    "        # End the session\n",
    "        await computer.stop()\n",
    "\n",
    "\n",
    "# Run the loop\n",
    "await cua_openai_loop()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final Remarks\n",
    "\n",
    "This notebook demonstrates a single iteration of a CUA/OpenAI loop where:\n",
    "\n",
    "- A macOS sandbox is controlled using the CUA interface.\n",
    "- A screenshot and prompt are sent to the OpenAI Responses API.\n",
    "- The returned action (e.g. a click or type command) is executed via the CUA interface.\n",
    "\n",
    "In a production setting, you would wrap the action-response cycle in a loop, handling multiple actions and safety checks as needed."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cua",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}